Search for: All records

Creators/Authors contains: "Liu, Yifei"


Enhanced File System Testing through Input and Output Coverage

https://doi.org/10.1145/3757347.3759138

Liu, Yifei; Kuenning, Geoff; Parvez, Md Kamal; Smolka, Scott A; Zadok, Erez (September 2025, ACM)

Effective file system testing relies on coverage to detect bugs and enhance reliability. We analyzed real file system bugs and found a weak correlation between code coverage, the most commonly used metric, and test effectiveness; many bugs were in covered code but remained undetected. Our study also showed that covering diverse file system inputs and outputs (system call arguments and return values) can be key to detecting the majority of observed bugs. We present input coverage and output coverage as new metrics for evaluating and improving file system testing, and have developed the IOCov framework for computing these metrics. Unlike existing system call tracers, IOCov computes coverage using only the calls relevant to testing, excluding unrelated ones that should not be counted. To demonstrate IOCov’s utility, we used it to extend the existing testing tool CrashMonkey into CM-IOCov, which achieves broader input coverage and more thorough detection of crash consistency bugs. Our experimental evaluation shows that IOCov computes input and output coverage accurately with minimal overhead. IOCov is applicable to different types of file system testing and can provide insights for improvement as well as identify untested cases based on coverage results. Moreover, the bugs found exclusively by CM-IOCov are 2.1 and 12.9 times more than those found exclusively by CrashMonkey on the 6.12 and 5.6 kernels, respectively, demonstrating the effectiveness of the IOCov-based coverage approach.
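To make the idea of input and output coverage concrete, the following is a minimal sketch and not IOCov's actual implementation. It assumes a hypothetical in-memory trace of `(syscall, flags, return value)` records and an arbitrarily chosen set of tracked `open()` flags; it counts how often each flag (input) and each outcome (output) was exercised, skipping calls that are not relevant to the metric.

```python
import errno
import os
from collections import Counter

# Hypothetical trace records: (syscall name, open flags, return value).
# A negative return value stands for -errno, as in raw system call results.
TRACE = [
    ("open", os.O_RDONLY, 3),
    ("open", os.O_WRONLY | os.O_CREAT, 4),
    ("open", os.O_RDONLY, -errno.ENOENT),
    ("close", 0, 0),  # not an open() call, so it is skipped below
]

# Flags tracked by this sketch; the selection is an assumption.
TRACKED_FLAGS = {
    "O_WRONLY": os.O_WRONLY,
    "O_RDWR": os.O_RDWR,
    "O_CREAT": os.O_CREAT,
    "O_TRUNC": os.O_TRUNC,
    "O_APPEND": os.O_APPEND,
}


def open_coverage(trace):
    """Count how often each tracked open() flag and each outcome occurred."""
    inputs = Counter({name: 0 for name in TRACKED_FLAGS})
    outputs = Counter()
    for name, flags, ret in trace:
        if name != "open":
            continue  # only calls relevant to the metric are counted
        for flag_name, bit in TRACKED_FLAGS.items():
            if flags & bit:
                inputs[flag_name] += 1
        outputs["success" if ret >= 0 else errno.errorcode[-ret]] += 1
    return inputs, outputs


inputs, outputs = open_coverage(TRACE)
# Flags with a zero count are untested inputs in this trace.
untested = [name for name, count in inputs.items() if count == 0]
```

Here `untested` lists the tracked flags that no traced `open()` call used, which is the kind of gap the abstract describes as an untested case.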
Free, publicly-accessible full text available September 8, 2026
Kneeliverse: A universal knee-detection library for performance curves

https://doi.org/10.1016/j.softx.2025.102161

Antunes, Mário; Estro, Tyler; Bhandari, Pranav; Gandhi, Anshul; Kuenning, Geoff; Liu, Yifei; Waldspurger, Carl; Wildani, Avani; Zadok, Erez (May 2025, SoftwareX)

Identifying knee and elbow points in performance curves is a critical task in various domains, including machine learning and system design. These points represent optimal trade-offs between cost and performance, facilitating efficient decision-making and resource allocation. However, accurately determining the knees and elbows in curves poses a significant challenge. To address this challenge, we introduce Kneeliverse, an open-source library dedicated to knee/elbow point detection. Kneeliverse incorporates a suite of well-established knee-detection algorithms, including Menger, L-method, Kneedle, and DFDT. Additionally, Kneeliverse extends these algorithms to detect multiple knees and elbows in complex curves, employing a recursive approach. Kneeliverse further includes Z-Method, a recently developed algorithm specifically designed for multi-knee detection.
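As an illustration of one of the algorithms named above, here is a minimal sketch of knee detection by Menger curvature. It is not Kneeliverse's API; the function names are hypothetical, and the sketch skips the axis normalization and multi-knee recursion that a practical implementation would likely need. The Menger curvature of three points is 4A / (abc), where A is the triangle area and a, b, c are its side lengths.

```python
import math


def menger_curvature(p, q, r):
    """Curvature of the circle through three 2-D points (0 if collinear)."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    # The cross product equals twice the signed triangle area.
    cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
    sides = math.dist(p, q) * math.dist(q, r) * math.dist(p, r)
    if sides == 0:
        return 0.0  # at least two points coincide
    # 4 * area / (a * b * c), with area = |cross| / 2
    return 2.0 * abs(cross) / sides


def knee_index(points):
    """Index of the interior point with the largest Menger curvature."""
    if len(points) < 3:
        raise ValueError("need at least three points")
    return max(
        range(1, len(points) - 1),
        key=lambda i: menger_curvature(points[i - 1], points[i], points[i + 1]),
    )


# A curve that drops linearly and then flattens; the corner is at index 3.
curve = [(0, 4), (1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
knee = knee_index(curve)
```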
Free, publicly-accessible full text available May 1, 2026
ORTEGA v1.0: an open-source Python package for context-aware interaction analysis using movement data

https://doi.org/10.1186/s40462-024-00460-2

Su, Rongxiang; Liu, Yifei; Dodge, Somayeh (December 2024, Movement Ecology)

Background: Interaction analysis via movement in space and time contributes to understanding social relationships among individuals and their dynamics in ecological systems. While there is an exciting growth in research in computational methods for interaction analysis using movement data, there remain challenges regarding reproducibility and replicability of the existing approaches. The current movement interaction analysis tools are often less accessible or tested for broader use in ecological research. To address these challenges, this paper presents ORTEGA, an Object-oRiented TimE-Geographic Analytical tool, as an open-source Python package for analyzing potential interactions between pairs of moving entities based on the observation of their movement. ORTEGA is developed based on one of the newly emerged time-geographic approaches for quantifying space-time interaction patterns among animals. A case study is presented to demonstrate and evaluate the functionalities of ORTEGA in tracing dynamic interaction patterns in animal movement data. Besides making the analytical code and data freely available to the community, the developed package also offers an extension of the existing theoretical development of ORTEGA for incorporating a context-aware ability to inform interaction analysis. ORTEGA contributes two significant capabilities: (1) the functions to identify potential interactions (e.g., encounters, concurrent interactions, delayed interactions) from movement data of two or more entities using a time-geographic-based approach; and (2) the capacity to compute attributes of potential interaction events including start time, end time, interaction duration, and difference in movement parameters such as speed and moving direction, and also contextualize the identified potential interaction events.
Full Text Available
Analyzing tiger interaction and home range shifts using a time-geographic approach

https://doi.org/10.1186/s40462-024-00454-0

Liu, Yifei; Dodge, Somayeh; Simcharoen, Achara; Ahearn, Sean C; Smith, James L D (December 2024, Movement Ecology)

Background: Interaction through movement can be used as a marker to understand and model interspecific and intraspecific species dynamics, and the collective behavior of animals sharing the same space. This research leverages the time-geography framework, commonly used in human movement research, to explore the dynamic patterns of interaction between Indochinese tigers (Panthera tigris corbeti) in the western forest complex (WEFCOM) in Thailand. Methods: We propose and assess ORTEGA, a time-geographic interaction analysis method, to trace spatio-temporal interaction patterns and home range shifts among tigers. Using unique GPS tracking data of tigers in WEFCOM collected over multiple years, concurrent and delayed interaction patterns of tigers are investigated. The outcomes are compared for intraspecific tiger interaction across different genders, relationships, and life stages. Additionally, the performance of ORTEGA is compared to a commonly used proximity-based approach. Results: Among the 67 tracked tigers, 42 show concurrent interactions at shared boundaries. Further investigation of five tigers with overlapping home ranges (two adult females, a male, and two young male tigers) suggests that the mother tiger and her two young mostly stay together before their dispersal but interact less post-dispersal. The male tiger increases encounters with the mother tiger while her young shift their home ranges. On another timeline, the neighbor female tiger mostly avoids the mother tiger. Through these home range dynamics and interaction patterns, we identify four types of interaction among these tigers: following, encounter, latency, and avoidance. Compared to the proximity-based approach, ORTEGA better detects concurrent mother–young interactions during pre-dispersal, while the proximity-based approach misses many interactions among the dyads. With larger spatial buffers and temporal windows, the proximity-based approach detects more encounters but may overestimate the duration of interaction. Conclusions: This research demonstrates the applicability and merits of ORTEGA as a time-geographic based approach to animal movement interaction analysis. We show time geography can develop valuable, data-driven insights about animal behavior and interactions. ORTEGA effectively traces frequent encounters and temporally delayed interactions between animals, without relying on specific spatial and temporal buffers. Future research should integrate contextual and behavioral information to better identify and characterize the nature of species interaction.
Full Text Available
Biomimetic Hierarchies for Universal Surface Enhancement and Applications in Water Treatment

https://doi.org/10.1021/acsami.4c10548

Liu, Yifei; Roy, Ajit K; Fan, Donglei Emma (October 2024, ACS Applied Materials & Interfaces)

Full Text Available
Novel Uncertainty Quantification through Perturbation-Assisted Sample Synthesis

https://doi.org/10.1109/TPAMI.2024.3393364

Liu, Yifei; Shen, Rex; Shen, Xiaotong (January 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)
Lee, Kyoung Mu (Ed.)
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly involving unstructured data while utilizing deep learning models. On one hand, PASS employs a generative model to create synthetic data that closely mirrors raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy. By incorporating knowledge transfer from large pretrained generative models, PASS enhances estimation accuracy, yielding refined distributional estimates of various statistics via Monte Carlo experiments. On the other hand, PAI boasts its statistically guaranteed validity. In pivotal inference, it enables precise conclusions even without prior knowledge of the pivotal’s distribution. In non-pivotal situations, we enhance the reliability of synthetic data generation by training it with an independent holdout sample. We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
Full Text Available
Accelerating multi-tier storage cache simulations using knee detection

https://doi.org/10.1016/j.peva.2024.102410

Estro, Tyler; Antunes, Mário; Bhandari, Pranav; Gandhi, Anshul; Kuenning, Geoff; Liu, Yifei; Waldspurger, Carl; Wildani, Avani; Zadok, Erez (May 2024, Performance Evaluation)

Storage cache hierarchies include diverse topologies, assorted parameters and policies, and devices with varied performance characteristics. Simulation enables efficient exploration of their configuration space while avoiding expensive physical experiments. Miss Ratio Curves (MRCs) efficiently characterize the performance of a cache over a range of cache sizes, revealing “key points” for cache simulation, such as knees in the curve that immediately follow sharp cliffs. Unfortunately, there are no automated techniques for efficiently finding key points in MRCs, and the cross-application of existing knee-detection algorithms yields inaccurate results. We present a multi-stage framework that identifies key points in any MRC, for both stack-based (e.g., LRU) and more sophisticated eviction algorithms (e.g., ARC). Our approach quickly locates candidates using efficient hash-based sampling, curve simplification, knee detection, and novel post-processing filters. We introduce Z-Method, a new multi-knee detection algorithm that employs statistical outlier detection to choose promising points robustly and efficiently. We evaluated our framework against seven other knee-detection algorithms, identifying key points in multi-tier MRCs with both ARC and LRU policies for 106 diverse real-world workloads. Compared to naïve approaches, our framework reduced the total number of points needed to accurately identify the best two-tier cache hierarchies by an average factor of approximately 5.5x for ARC and 7.7x for LRU. We also show how our framework can be used to seed the initial population for evolutionary algorithms. We ran 32,616 experiments requiring over three million cache simulations, on 151 samples, from three datasets, using a diverse set of population initialization techniques, evolutionary algorithms, knee-detection algorithms, cache replacement algorithms, and stopping criteria. Our results showed an overall acceleration rate of 34% across all configurations.
Full Text Available
Metis: File System Model Checking via Versatile Input and State Exploration

Liu, Yifei; Adkar, Manish; Holzmann, Gerard; Kuenning, Geoff; Liu, Pei; Smolka, Scott; Su, Wei; Zadok, Erez (February 2024, USENIX)

We present Metis, a model-checking framework designed for versatile, thorough, yet configurable file system testing in the form of input and state exploration. It uses a nondeterministic loop and a weighting scheme to decide which system calls and their arguments to execute. Metis features a new abstract state representation for file-system states in support of efficient and effective state exploration. While exploring states, it compares the behavior of a file system under test against a reference file system and reports any discrepancies; it also provides support to investigate and reproduce any that are found. We also developed RefFS, a small, fast file system that serves as a reference, with special features designed to accelerate model checking and enhance bug reproducibility. Experimental results show that Metis can flexibly generate test inputs; also the rate at which it explores file-system states scales nearly linearly across multiple nodes. RefFS explores states 3–28x faster than other, more mature file systems. Metis aided the development of RefFS, reporting 11 bugs that we subsequently fixed. Metis further identified 12 bugs from five other file systems, five of which were confirmed and with one fixed and integrated into Linux.
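The comparison against a reference file system can be pictured with a small differential-testing sketch. This is not Metis's code: the function names are hypothetical, two directory trees stand in for the file system under test and the reference, and the choice of what goes into the abstract state (relative paths, entry types, sizes, and file contents) is an assumption made for illustration.

```python
import hashlib
import os


def abstract_state(root):
    """Hash the paths, entry types, sizes, and file contents under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix traversal order so the hash is deterministic
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.isdir(path):
                digest.update(f"D:{rel}\n".encode())
            else:
                with open(path, "rb") as f:
                    data = f.read()
                digest.update(f"F:{rel}:{len(data)}\n".encode())
                digest.update(data)
    return digest.hexdigest()


def apply_ops(root, ops):
    """Run (operation, relative path, payload) steps; return their outcomes."""
    outcomes = []
    for op, rel, payload in ops:
        path = os.path.join(root, rel)
        try:
            if op == "mkdir":
                os.mkdir(path)
            elif op == "write":
                with open(path, "wb") as f:
                    f.write(payload)
            elif op == "unlink":
                os.unlink(path)
            else:
                raise ValueError(f"unknown operation: {op}")
            outcomes.append(0)
        except OSError as exc:
            outcomes.append(exc.errno)  # record the error code as the output
    return outcomes


def discrepancy(root_a, root_b, ops):
    """True if the two trees disagree on outcomes or on the resulting state."""
    return (apply_ops(root_a, ops) != apply_ops(root_b, ops)
            or abstract_state(root_a) != abstract_state(root_b))
```

Calling `discrepancy` with the mount points of two different file systems and the same operation list would flag any step sequence after which their return codes or abstract states differ.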
Full Text Available
Metis: File System Model Checking via Versatile Input and State Exploration

Liu, Yifei; Adkar, Manish; Holzmann, Gerard; Kuenning, Geoff; Liu, Pei; Smolka, Scott; Su, Wei; Zadok, Erez (February 2024, Usenix FAST 2024)

Full Text Available
Input and Output Coverage Needed in File System Testing

https://doi.org/10.1145/3599691.3603405

Liu, Yifei; Ahuja, Gautam; Kuenning, Geoff; Smolka, Scott; Zadok, Erez (July 2023, The 15th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage '23))

File systems need testing to discover bugs and to help ensure reliability. Many file system testing tools are evaluated based on their code coverage. We analyzed recently reported bugs in Ext4 and BtrFS and found a weak correlation between code coverage and test effectiveness: many bugs are missed because they depend on specific inputs, even though the code was covered by a test suite. Our position is that coverage of system call inputs and outputs is critically important for testing file systems. We thus suggest input and output coverage as criteria for file system testing, and show how they can improve the effectiveness of testing. We built a prototype called IOcov to evaluate the input and output coverage of file system testing tools. IOcov identified many untested cases (specific inputs and outputs or ranges thereof) for both CrashMonkey and xfstests. Additionally, we discuss a method and associated metrics to identify over- and under-testing using IOcov.
Full Text Available
